NSF PAR Search | NSF Public Access Repository


Leveraging family data to design Mendelian Randomization that is provably robust to population stratification

https://doi.org/10.1101/gr.277664.123

LaPierre, Nathan; Fu, Boyang; Turnbull, Steven; Eskin, Eleazar; Sankararaman, Sriram (May 2023, Genome Research)

Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We then conducted an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank dataset. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated due to confounding from population stratification.
Full Text Available
Leveraging pleiotropy for joint analysis of genome-wide association studies with per trait interpretations

https://doi.org/10.1371/journal.pgen.1010447

Taraszka, Kodi; Zaitlen, Noah; Eskin, Eleazar (November 2022, PLOS Genetics)
Epstein, Michael P. (Ed.)
We introduce pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations over the next best method. When these associations were interpreted on a per trait level using m-values, PAT had 37.5% more true per trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per trait associations for two traits were almost tripled and were nearly doubled for another trait relative to the original single trait GWAS.
Full Text Available
Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy

https://doi.org/10.1038/s41467-022-28553-9

Cinelli, Carlos; LaPierre, Nathan; Hill, Brian L.; Sankararaman, Sriram; Eskin, Eleazar (December 2022, Nature Communications)

Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.
Full Text Available
Accurate modeling of replication rates in genome-wide association studies by accounting for Winner’s Curse and study-specific heterogeneity

https://doi.org/10.1093/g3journal/jkac261

Zou, Jennifer; Zhou, Jinjing; Faller, Sarah; Brown, Robert P.; Sankararaman, Sriram S.; Eskin, Eleazar (October 2022, G3 Genes|Genomes|Genetics)
Matise, T. (Ed.)

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex human traits, but only a fraction of variants identified in discovery studies achieve significance in replication studies. Replication in genome-wide association studies has been well-studied in the context of Winner’s Curse, which is the inflation of effect size estimates for significant variants due to statistical chance. However, Winner’s Curse is often not sufficient to explain lack of replication. Another reason why studies fail to replicate is that there are fundamental differences between the discovery and replication studies. A confounding factor can create the appearance of a significant finding while actually being an artifact that will not replicate in future studies. We propose a statistical framework that utilizes genome-wide association studies and replication studies to jointly model Winner’s Curse and study-specific heterogeneity due to confounding factors. We apply this framework to 100 genome-wide association studies from the Human Genome-Wide Association Studies Catalog and observe that there is a large range in the level of estimated confounding. We demonstrate how this framework can be used to distinguish when studies fail to replicate due to statistical noise and when they fail due to confounding.
MARS: leveraging allelic heterogeneity to increase power of association testing

https://doi.org/10.1186/s13059-021-02353-8

Hormozdiari, Farhad; Jung, Junghyun; Eskin, Eleazar; Joo, Jong Wha J. (December 2021, Genome Biology)
In standard genome-wide association studies (GWAS), the standard association test is underpowered to detect associations between loci with multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype, considering the causal status of variants, only requiring the existing summary statistics to detect associated risk loci. Utilizing extensive simulated data and real data, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while controlling the type I error.
Full Text Available
RNA-seq data science: From raw data to effective interpretation

https://doi.org/10.3389/fgene.2023.997383

Deshpande, Dhrithi; Chhugani, Karishma; Chang, Yutong; Karlsberg, Aaron; Loeffler, Caitlin; Zhang, Jinyang; Muszyńska, Agata; Munteanu, Viorel; Yang, Harry; Rotman, Jeremy; et al (March 2023, Frontiers in Genetics)

RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
Full Text Available
Identifying causal variants by fine mapping across multiple studies

https://doi.org/10.1371/journal.pgen.1009733

LaPierre, Nathan; Taraszka, Kodi; Huang, Helen; He, Rosemary; Hormozdiari, Farhad; Eskin, Eleazar (September 2021, PLOS Genetics)
Zeggini, Eleftheria (Ed.)
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
Full Text Available
Bruins-in-Genomics: Evaluation of the impact of a UCLA undergraduate summer program in computational biology on participating students

https://doi.org/10.1371/journal.pone.0268861

Coller, Hilary A.; Beggs, Stacey; Andrews, Samantha; Maloy, Jeff; Chiu, Alec; Sankararaman, Sriram; Pellegrini, Matteo; Freimer, Nelson; Johnson, Tracy; Papp, Jeanette; et al (May 2022, PLOS ONE)
Kakulapati, Vijayalakshmi (Ed.)
Recruiting, training and retaining scientists in computational biology is necessary to develop a workforce that can lead the quantitative biology revolution. Yet, African-American/Black, Hispanic/Latinx, Native Americans, and women are severely underrepresented in computational biosciences. We established the UCLA Bruins-in-Genomics Summer Research Program to provide training and research experiences in quantitative biology and bioinformatics to undergraduate students with an emphasis on students from backgrounds underrepresented in computational biology. Program assessment was based on number of applicants, alumni surveys and comparison of post-graduate educational choices for participants and a control group of students who were accepted but declined to participate. We hypothesized that participation in the Bruins-in-Genomics program would increase the likelihood that students would pursue post-graduate education in a related field. Our surveys revealed that 75% of Bruins-in-Genomics Summer participants were enrolled in graduate school. Logistic regression analysis revealed that women who participated in the program were significantly more likely to pursue a Ph.D. than a matched control group (group x woman interaction term of p = 0.005). The Bruins-in-Genomics Summer program represents an example of how a combined didactic-research program structure can make computational biology accessible to a wide range of undergraduates and increase participation in quantitative biosciences.
Full Text Available
Metalign: efficient alignment-based metagenomic profiling via containment min hash

https://doi.org/10.1186/s13059-020-02159-0

LaPierre, Nathan; Alser, Mohammed; Eskin, Eleazar; Koslicki, David; Mangul, Serghei (December 2020, Genome Biology)
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
Full Text Available
PLEIO: a method to map and interpret pleiotropic loci with GWAS summary statistics

https://doi.org/10.1016/j.ajhg.2020.11.017

Lee, Cue Hyunkyu; Shi, Huwenbo; Pasaniuc, Bogdan; Eskin, Eleazar; Han, Buhm (January 2021, The American Journal of Human Genetics)
Full Text Available
